Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Abstract
In high dimensional, low sample size (HDLSS) settings, linear discriminant analysis (LDA) possesses the “data piling” property: it maps all points from the same class in the training data to a common point, so that when viewed along the LDA projection directions, the data are piled up. Data piling indicates overfitting and usually results in poor out-of-sample classification. In this paper, a novel approach to overcome the data piling problem is introduced. It incorporates variable selection into LDA. The underlying assumption is that, among the large number of variables, many are irrelevant or redundant for the purpose of classification. By using only the important or significant variables, we essentially deal with a lower dimensional problem. Experiments on both synthetic and real data sets show that the proposed method is effective in overcoming the data piling and overfitting problems of LDA while improving out-of-sample classification performance.
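The data piling effect described in the abstract can be reproduced on synthetic data. The following is a minimal sketch under stated assumptions, not the paper's method: the data are Gaussian noise with five shifted variables, `piling_direction` is a hypothetical helper that projects the mean difference onto the null space of the within-class scatter (a direction that maximizes Fisher's criterion when there are more variables than samples), and the variable screening score (absolute mean difference over within-class standard deviation) is an illustrative stand-in for whatever selection rule the paper actually uses.

```python
import numpy as np

def class_means_and_residuals(X, y):
    """Return the class means and the within-class centered training points."""
    classes = np.unique(y)
    means = [X[y == c].mean(axis=0) for c in classes]
    residuals = np.vstack([X[y == c] - m for c, m in zip(classes, means)])
    return means, residuals

def piling_direction(X, y):
    """Unit direction with zero within-class scatter on the training data.

    Hypothetical helper: removes from the mean difference its component
    in the span of the within-class residuals.
    """
    means, residuals = class_means_and_residuals(X, y)
    _, s, vt = np.linalg.svd(residuals, full_matrices=False)
    basis = vt[s > 1e-10 * s.max()]          # orthonormal rows spanning the residuals
    diff = means[1] - means[0]
    w = diff - basis.T @ (basis @ diff)      # project onto the null space
    return w / np.linalg.norm(w)

def max_class_spread(X, y, w):
    """Largest within-class standard deviation of the projections X @ w."""
    proj = X @ w
    return max(proj[y == c].std() for c in np.unique(y))

# Assumed synthetic HDLSS setting: 20 samples, 200 variables, 5 informative.
rng = np.random.default_rng(0)
n_per_class, d, n_informative = 10, 200, 5
y = np.repeat([0, 1], n_per_class)

def sample():
    X = rng.normal(size=(2 * n_per_class, d))
    X[y == 1, :n_informative] += 1.0
    return X

X_train, X_new = sample(), sample()

# All variables: training points of each class collapse to a single value.
w = piling_direction(X_train, y)
spread_train = max_class_spread(X_train, y, w)
spread_new = max_class_spread(X_new, y, w)   # fresh data do not pile up

# Keep a few variables (illustrative screening), then apply classical LDA.
means, residuals = class_means_and_residuals(X_train, y)
score = np.abs(means[1] - means[0]) / residuals.std(axis=0)
keep = np.argsort(score)[-n_informative:]
means_s, residuals_s = class_means_and_residuals(X_train[:, keep], y)
S_w = residuals_s.T @ residuals_s            # now full rank: 5 variables, 20 samples
w_sel = np.linalg.solve(S_w, means_s[1] - means_s[0])
w_sel /= np.linalg.norm(w_sel)
spread_sel = max_class_spread(X_train[:, keep], y, w_sel)

print(f"training spread, all variables:      {spread_train:.2e}")
print(f"new-data spread, all variables:      {spread_new:.2e}")
print(f"training spread, selected variables: {spread_sel:.2e}")
```

With all 200 variables the training spread is zero up to rounding error while new data from the same distribution remain spread out, which is the overfitting signature the abstract describes. After restricting to a handful of variables the within-class scatter matrix is invertible and the training projections no longer collapse.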
Similar resources
Sparse Linear Discriminant Analysis with Applications to High Dimensional Low Sample Size Data
This paper develops a method for automatically incorporating variable selection in Fisher’s linear discriminant analysis (LDA). Utilizing the connection of Fisher’s LDA and a generalized eigenvalue problem, our approach applies the method of regularization to obtain sparse linear discriminant vectors, where “sparse” means that the discriminant vectors have only a small number of nonzero compone...
Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
A Comparison of Methods for Group Prediction with High Dimensional Data
High dimensional data is the situation in which the number of variables included in an analysis approaches or exceeds the sample size. In the context of group classification, researchers are typically interested in finding a model that can be used to correctly place an individual into their appropriate group; e.g. correctly diagnose individuals with depression. However, when the size of the tra...
Modified linear discriminant analysis approaches for classification of high-dimensional microarray data
Linear discriminant analysis (LDA) is one of the most popular methods of classification. For high-dimensional microarray data classification, due to the small number of samples and large number of features, classical LDA has sub-optimal performance corresponding to the singularity and instability of the within-group covariance matrix. Two modified LDA approaches (MLDA and NLDA) were applied for...
Feature Selection By KDDA For SVM-Based MultiView Face Recognition
Applications such as Face Recognition (FR) that deal with high-dimensional data need a mapping technique that introduces a representation of low-dimensional features with enhanced discriminatory power, and a proper classifier able to classify those complex features. Most traditional Linear Discriminant Analysis (LDA) methods suffer from the disadvantage that their optimality criteria are not directly ...